Think Globally, Apply Locally: Using Distributional Characteristics for Hindi Named Entity Identification
Abstract
In this paper, we present a novel approach to Hindi Named Entity Identification (NEI) in a large corpus. The key idea is to harness the global distributional characteristics of the words in the corpus. We show that combining these global distributional characteristics with local context information improves NEI performance over statistical baseline systems that employ only local context. The improvement is substantial (about 10%) in scenarios where the test and training corpora belong to different genres. We also propose a novel measure for NEI based on term informativeness and show that it is competitive with the best measure and better than other well-known information measures.
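The abstract does not specify the proposed measure, so the sketch below is not the authors' method. It only illustrates what a corpus-level term informativeness score can look like, using Residual IDF, one well-known measure of this kind. The function name `residual_idf` and the toy romanized tokens are assumptions made for illustration.

```python
import math
from collections import Counter


def residual_idf(docs):
    """Return {term: Residual IDF} for a list of tokenized documents.

    Residual IDF is the observed IDF minus the IDF expected if the
    term's occurrences were spread over documents by a Poisson process.
    Terms that cluster in few documents score high.
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing the term
    coll_freq = Counter()  # total occurrences of the term in the corpus
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))

    scores = {}
    for term, df in doc_freq.items():
        observed_idf = -math.log2(df / n_docs)
        # Poisson model: P(term appears in a document) = 1 - exp(-cf / N)
        expected_idf = -math.log2(1.0 - math.exp(-coll_freq[term] / n_docs))
        scores[term] = observed_idf - expected_idf
    return scores


# Hypothetical toy corpus: "delhi" is bursty, "the" is spread evenly.
toy_docs = [
    ["delhi", "delhi", "delhi", "the"],
    ["the", "city"],
    ["the", "river"],
    ["the", "city"],
]
scores = residual_idf(toy_docs)
```

In this toy corpus the bursty token `delhi` receives a positive score and the evenly spread token `the` a negative one. A global score of this kind could then be supplied as an extra feature to a classifier that otherwise relies on local context.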
Similar resources
Mining Transliterations from Wikipedia using Dynamic Bayesian Networks
Transliteration mining is aimed at building high quality multi-lingual named entity (NE) lexicons for improving performance in various Natural Language Processing (NLP) tasks including Machine Translation (MT) and Cross Language Information Retrieval (CLIR). In this paper, we apply two Dynamic Bayesian network (DBN)-based edit distance (ED) approaches in mining transliteration pairs from Wikipe...
Towards Language Independent NE Identification in the context of Wikipedia
Named Entity Identification/Recognition is a key component for most Information Extraction tasks. All the existing approaches for NEI use extensive language-specific resources. This paper deals with the problem of multilingual Named Entity Identification (NEI), and explains the need to address this problem in a language-independent fashion. In this work we focus on Less Resourced languages lik...
A Distributional Semantics Approach to Simultaneous Recognition of Multiple Classes of Named Entities
Named Entity Recognition and Classification has been studied for the last two decades. Since semantic features take a huge amount of training time and are slow in inference, the existing tools apply features and rules mainly at the word level or use lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al’s...
A Hybrid Approach of English-Hindi Named-entity Transliteration
In recent years, machine transliteration has become a center of attention for research. Both machine translation and transliteration are important for e-governance and web-based online multilingual applications. Because machine translation translates the source language into the target language, it produces wrong translations for named entities. Named entities are required to be translated with preserving t...
Named Entity Recognition for South Asian Languages
Much work has already been done on building named entity recognition systems. However, most of this work has concentrated on English and other European languages. Hence, building a named entity recognition (NER) system for South Asian Languages (SAL) is still an open problem because they exhibit characteristics different from English. This paper builds a named entity recognizer which also i...
Publication date: 2010